Add trim_galore test fixtures (for MultiQC#3538) by FelixKrueger · Pull Request #377 · MultiQC/test-data

FelixKrueger · 2026-04-27T12:38:55Z

Companion PR to MultiQC/MultiQC#3538, which adds a native MultiQC module for Trim Galore v2.x.

This PR adds the test fixtures the new module needs under data/modules/trim_galore/. They are real Trim Galore v2.1.0-beta.5 JSON outputs:

sample_R1.fastq.gz_trimming_report.json — single-end (10K Illumina, long adapter-length distribution including the 1-bp tail typical for the default --stringency 1)
BS-seq_10K_R{1,2}.fastq.gz_trimming_report.json — paired-end pair (BS-seq 10K, with pair_validation populated and short adapter-length tails — covers the PE code path)

Schema reference: schema_version: 1, documented in the upstream issue thread (MultiQC/MultiQC#3529).
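For anyone consuming these fixtures outside MultiQC, a version gate on `schema_version` might look roughly like the sketch below. It assumes the field can be an integer or a semver-like string and that only the major part matters; the helper names and the example payload are hypothetical and are not taken from the module itself.

```python
import json

# The fixtures in this PR declare schema_version 1.
SUPPORTED_MAJOR = 1


def schema_major(report: dict) -> int:
    """Return the major part of schema_version (accepts 1, "1" or "1.2.0")."""
    raw = str(report["schema_version"])
    return int(raw.split(".")[0])


def is_supported(report: dict) -> bool:
    """True when the report's major schema version matches what we understand."""
    return schema_major(report) == SUPPORTED_MAJOR


# Hypothetical minimal payload; real reports carry many more fields.
example = json.loads('{"schema_version": 1}')
supported = is_supported(example)
```

Gating on the major version only means a minor schema bump would still parse, while a major bump is rejected instead of being silently misread.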

The MultiQC PR's test_modules_run.py::test_all_modules[trim_galore-…] and test_ignore_samples[trim_galore-…] checks fail until this PR merges (they look for test-data/data/modules/trim_galore/). Happy to coordinate merge order — most natural is to merge this first, then unblock the MultiQC PR's CI.

Test fixtures for the new MultiQC `trim_galore` module proposed in MultiQC/MultiQC#3538. These are real Trim Galore v2.1.0-beta.5 outputs:

- sample_R1.fastq.gz_trimming_report.json: single-end (10K Illumina, long adapter-length distribution including the 1-bp tail typical for the --stringency 1 default)
- BS-seq_10K_R{1,2}.fastq.gz_trimming_report.json: paired-end pair (BS-seq 10K, with `pair_validation` populated and short adapter-length tails, useful coverage for the PE code path)

Schema reference: schema_version 1, documented in the upstream issue thread at MultiQC/MultiQC#3529.

The fixtures originally landed in `tests/data/modules/trim_galore/` of the main repo, but `test_modules_run.py` resolves test data via `<repo>/test-data/data/modules/<module>/` (a separate sibling repo, MultiQC/test-data). Moving them there in MultiQC/test-data#377.
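The lookup convention described above can be sketched as follows. This is an illustration of the `<repo>/test-data/data/modules/<module>/` layout only; the function names are hypothetical and this is not the code in `test_modules_run.py`.

```python
from pathlib import Path


def module_test_data_dir(repo_root: Path, module: str) -> Path:
    """Build <repo>/test-data/data/modules/<module>/ for a given module name."""
    return repo_root / "test-data" / "data" / "modules" / module


def has_fixtures(repo_root: Path, module: str) -> bool:
    """True when the module's fixture directory exists and is not empty."""
    data_dir = module_test_data_dir(repo_root, module)
    return data_dir.is_dir() and any(data_dir.iterdir())


# Hypothetical checkout location, used only to show the resulting path.
expected_dir = module_test_data_dir(Path("/repo"), "trim_galore")
```

Under this layout a module with no directory in the sibling test-data checkout has nothing to run against, which is why the fixtures need to land in MultiQC/test-data before the module PR's CI can pass.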

ewels

These are still up to date post-release, right?

Nit: Please could you include the associated log files as well? I want to make sure that we don't show sections for both TrimGalore and Cutadapt together, so it would help to test for that blocking effect.

FelixKrueger · 2026-05-11T12:41:58Z

GSM7431887_NB18_32F_TNTtoKSR_553_rep3_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz_trimming_report.json
GSM7431887_NB18_32F_TNTtoKSR_553_rep3_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz_trimming_report.txt
GSM7431887_NB18_32F_TNTtoKSR_553_rep3_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz_trimming_report.json
GSM7431887_NB18_32F_TNTtoKSR_553_rep3_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz_trimming_report.txt
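With both the `.json` and `.txt` reports present, the blocking effect can be checked against the text files. The sketch below illustrates the exclusion rule described in the companion PR (skip text reports whose "Trim Galore version: " line has a major version of 2 or higher). The regex and the sample header lines are assumptions made for illustration, not the pattern shipped in MultiQC's search patterns.

```python
import re

# Hypothetical pattern: "Trim Galore version: " followed by a major version >= 2.
TG_V2_PLUS = re.compile(r"Trim Galore version: (?:[2-9]|[1-9]\d+)\b")


def cutadapt_should_skip(text: str) -> bool:
    """True when a text report looks like it was written by Trim Galore v2.x or later."""
    return TG_V2_PLUS.search(text) is not None


# Illustrative header snippets; the exact wording of real reports may differ.
v2_report = "Trim Galore version: 2.1.0\nThis is cutadapt 4.0"
legacy_report = "Trim Galore version: 0.6.10\nThis is cutadapt 4.0"
plain_cutadapt = "This is cutadapt 4.0 with Python 3.10"

skip_flags = [cutadapt_should_skip(t) for t in (v2_report, legacy_report, plain_cutadapt)]
```

Only the v2.x report is excluded here; legacy Trim Galore reports and plain cutadapt logs still reach the cutadapt module.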

* Add native MultiQC module for Trim Galore v2.x (Oxidized Edition)

  Closes #3529.

  Trim Galore v2.x emits a structured `*_trimming_report.json` (schema v1) alongside the legacy `*_trimming_report.txt` report. The text report still carries the `"This is cutadapt"` shim for backwards compatibility, so the existing `cutadapt` module path keeps working unchanged. This new module parses the JSON natively, which:

  - Gets the Software Versions table right ("Trim Galore X.Y.Z" instead of the misleading "Cutadapt 4.0" backwards-compat shim)
  - Surfaces TrimGalore-specific stats not available from Cutadapt output (RRBS truncation counts, poly-A/G trimming, paired-end pair-validation outcomes; the latter two are wired through to the data file but not yet plotted; happy to add follow-up sections)

  ## What's plotted

  - General stats columns: % adapter, % pass, % q-trimmed, total reads (hidden), total bp written (hidden)
  - Filtered reads bargraph: passing / too_short / too_long / too_many_n / discarded_untrimmed
  - Adapter length distribution linegraph (per sample, per adapter when a sample has more than one)

  ## Sample-name handling

  PE TrimGalore reports list both R1 and R2 in `input_filenames` (both JSONs do; Trim Galore preserves the pair context). The parser uses the JSON's `read_number` field to pick the correct filename, so R1 and R2 become distinct samples.

  ## Coexistence with the cutadapt module

  Both modules will discover their respective files (text vs JSON). With both enabled, each sample appears in both modules' general-stats columns. Users who want to disable the cutadapt path on TrimGalore samples can:

  ```yaml
  disable_modules:
    - cutadapt
  ```

  Documented in the module's class docstring.

  ## Test fixtures

  `tests/data/modules/trim_galore/` contains:

  - `sample_R1.fastq.gz_trimming_report.json`: SE example (10K Illumina)
  - `BS-seq_10K_R{1,2}.fastq.gz_trimming_report.json`: PE example (BS-seq 10K, with `pair_validation` populated)

  Verified locally: `multiqc -m trim_galore tests/data/modules/trim_galore/` produces 3/3 reports parsed (1 SE + 2 PE), all sections rendered, data file written to `multiqc_data/multiqc_trim_galore.txt`.

  ## Schema reference

  JSON schema v1 is documented in the upstream issue thread (linked in the issue body). The parser version-gates on `schema_version: 1` and warns + skips files with a different version, so a future schema bump won't silently misparse.

  ## Status

  Marking as draft. Initial scope is intentionally focused; happy to extend for poly-A/G trimming sections, pair-validation visualisation, RRBS-specific stats, or anything else the maintainers want before a final review pass.

* Apply prettier + ruff format from prek hooks

* Move test fixtures to MultiQC/test-data fork (companion PR)

  The fixtures originally landed in `tests/data/modules/trim_galore/` of the main repo, but `test_modules_run.py` resolves test data via `<repo>/test-data/data/modules/<module>/` (a separate sibling repo, MultiQC/test-data). Moving them there in MultiQC/test-data#377.

* Address PR review feedback on trim_galore module

  - Move write_data_file to end of __init__ and flatten payload to scalar columns so multiqc_trim_galore.txt is machine-readable
  - Call add_software_version unconditionally; bail on ignored samples inside the parse loop
  - Drop module-level docstring and unicode divider comments per project style; tone down class docstring
  - Drop redundant _strip_fastq_suffix helper in favour of clean_s_name
  - Add SampleGroupingConfig so PE pairs collapse cleanly with table_sample_merge (weighted-average percentages, sum counts)
  - Remove hardcoded bargraph colours; use uniform composite keys in the adapter-length plot and continue on zero-adapter samples
  - Drop % Q-trim precision to {:,.1f}; surface tg_total_reads by default
  - Bump schema_version mismatch to log.error with explicit guidance
  - Simplify search_patterns.yaml (drop contents/num_lines shim)
  - Group trim_galore adjacent to cutadapt in config_defaults.yaml
  - Revert CHANGELOG.md entry (generated from PR titles)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Log debug message when JSON tool field is not Trim Galore

  Helps diagnostics if a non-TrimGalore JSON happens to match the filename glob.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Auto-suppress cutadapt module for Trim Galore v2.x text reports

  The cutadapt module's text-report pattern matches any file containing "This is cutadapt", which also catches the backwards-compatibility shim that Trim Galore v2.x writes alongside its native JSON report. Result: every v2.x sample shows up twice, once via cutadapt (as a misleading "Cutadapt 4.0") and once via trim_galore. Telling users to disable cutadapt globally also kills parsing of genuine cutadapt logs and legacy Trim Galore v0/v1 reports, so it isn't a real fix.

  Add an exclude_contents_re to the cutadapt text-report pattern matching "Trim Galore version: " followed by a major version of 2 or higher. v0.x / v1.x text reports continue to be picked up by cutadapt; v2.x text reports are skipped (the sibling JSON is handled by trim_galore); pure cutadapt logs are unaffected.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add Pair Validation, Poly-A/G, and RRBS sections to trim_galore

  Surface the schema-v1 fields that were already in the data file but not plotted: pair_validation, poly_a_trimming, poly_g_trimming, rrbs. Each is a small table with sensible gating.

  - Pair Validation: collapses R1/R2 (pair-level data is identical between them), drops rows where less than 0.1% of pairs were affected.
  - Poly-A/G and RRBS: per-row gating, samples with zero counts are hidden.
  - All three sections show a Bootstrap alert listing dropped samples, with long lists wrapped in <details> (bases2fastq pattern).
  - Defensive try/except around length_distribution int-coercion so a malformed key downgrades to a debug log rather than crashing the run.
  - Data file flattening extended to all of pair_validation, poly_a_trimming, poly_g_trimming and rrbs blocks (25 columns total).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add support for sample grouping

* Add explicit_groups for deterministic tool-derived sample grouping

  Extend SampleGroupingConfig with an `explicit_groups` parameter that lets modules supply their own ground-truth groups instead of relying on the user's `table_sample_merge` name patterns. Useful when the tool output already tells you which samples are related: paired-end trimmers that emit both filenames in each report, lane manifests, replicate IDs, etc. The framework silently ignores entries with a single member so callers don't need to filter them out themselves.

  Wire trim_galore to use this. Each JSON's `tuple(input_filenames)` is a stable pair key (byte-identical between R1 and R2 of the same pair). Auto-grouping applies to:

  - General Stats table: framework path with expand-to-see-individuals
  - Pair Validation table: manual collapse keyed on the same pair_key

  Filtered Reads bargraph, Poly-A/G and RRBS tables stay per-read because R1 and R2 stats there can legitimately differ.

  Users with `table_sample_merge` configured layer name-pattern grouping on top of the auto-derived pairs. The `trim_galore_config.auto_group_pairs: false` flag opts out of auto-grouping entirely.

  Replaces the earlier `_apply_grouping` helper that relied on `config.table_sample_merge` to pre-aggregate filtered_reads / poly / RRBS; those now stay per-read regardless of grouping config.

  Docs updated: developer guide gets a worked example for module authors with authoritative pair info; user-facing customisation page describes the auto-grouping behaviour and the opt-out.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Simplify trim_galore module after review pass

  - Store pair_key_to_samples and pair_display_by_key on `self` so _pair_validation_plot and _derive_auto_groups drop their extra parameters
  - Add a small _add_filtered_section helper that wraps the plot+description+alert+add_section pattern, collapsing three near-identical 11-line blocks
  - Simplify the gen_stats type annotation from a quadruple Union workaround to `Dict[str, Dict[ColumnKey, Any]]` plus a single `cast(Any, ...)` at the addcols call site
  - Drop the unused `Union` import that fell out of the above
  - Tighten narrative comments per CLAUDE.md (keep only WHY)

  Net: -30 lines, no behaviour change. Lint / mypy / module tests all clean across the three grouping scenarios (default, with table_sample_merge, and opt-out).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bit of clean up

* Manual review of docs

* Schema version: assume semver, only throw error on major version bump

* Remove some excessively cautious code

* Make code way less defensive still.

  If the data is that badly mangled, I'd rather it throw an exception instead of silently default to fake numbers

* Better docstring / docs

* Tidy up descriptions / helptext a bit2

---------

Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
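The sample-name and pair-key handling described in the commit message can be sketched as below. This is a simplified illustration, not the module's code: it assumes `read_number` is 1-based, the report dictionaries are made-up minimal payloads, and `group_pairs` is a hypothetical helper that filters single-member groups itself rather than relying on the framework to ignore them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_name(report: dict) -> str:
    """Pick this report's own file from input_filenames (read_number assumed 1-based)."""
    return report["input_filenames"][report["read_number"] - 1]


def group_pairs(reports: List[dict]) -> Dict[Tuple[str, ...], List[str]]:
    """Group sample names by tuple(input_filenames), which R1 and R2 of a pair share."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for report in reports:
        groups[tuple(report["input_filenames"])].append(sample_name(report))
    # Keep only real pairs; single-end reports form one-member groups.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical minimal reports: one paired-end pair and one single-end sample.
reports = [
    {"input_filenames": ["a_R1.fastq.gz", "a_R2.fastq.gz"], "read_number": 1},
    {"input_filenames": ["a_R1.fastq.gz", "a_R2.fastq.gz"], "read_number": 2},
    {"input_filenames": ["solo.fastq.gz"], "read_number": 1},
]
groups = group_pairs(reports)
```

Because the key comes from the report content rather than from filename patterns, the grouping does not depend on how a user has configured `table_sample_merge`.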

FelixKrueger mentioned this pull request Apr 27, 2026

Add native Trim Galore v2.x module (closes #3529) MultiQC/MultiQC#3538

Merged

ewels reviewed May 11, 2026

ewels merged commit cd690cb into MultiQC:main May 11, 2026

ewels merged 1 commit into MultiQC:main from FelixKrueger:add-trim-galore-fixtures
